Trend of Gastric Cancer after Bayesian Correction of Misclassification Error in Neighboring Provinces of Iran

Background: Errors may occur in disease registry systems. One of them is misclassification error in cancer registration, which arises because some patients from deprived provinces travel to adjacent provinces to receive better healthcare without mentioning their permanent residence. The aim of this study was to re-estimate the incidence of gastric cancer using Bayesian correction for misclassification across Iranian provinces.
Materials and Methods: Gastric cancer incidence data were taken from the Iranian national cancer registration reports from 2004 to 2008. Bayesian analysis was performed to estimate the misclassification rate, with a beta prior distribution for the misclassification parameter. The parameters of the beta distribution were selected according to the expected coverage of new cancer cases in each medical university of the country.
Results: There was remarkable misclassification in the registration of cancer cases across the provinces of the country. The average estimated misclassification rate was between 15% and 68%, and higher rates were estimated for more deprived provinces.
Conclusion: Misclassification error reduces the accuracy of registry data, in turn causing underestimation and overestimation of cancer risk in different areas. Correcting regional misclassification in cancer registry data is essential for discerning high-risk regions and planning cancer control and prevention.


Introduction
Gastric cancer is the fourth most prevalent form of malignancy, accounting for 8% of all new cases (989,600 diagnoses) [1]. In Iran, it is the most prevalent form of cancer in men and the third most common form (after breast and colorectal cancers) in women [2]. Its incidence is approximately twice as high among men as among women [3], and over 70% of cases occur in developing countries [1]. There is a large geographical dispersion in the incidence of gastric cancer on a global scale [3]. Furthermore, incidence rates vary widely between the populations at the lowest and highest risk of gastric cancer [4].
Cancer has been considered one of the leading causes of death worldwide [5]. This makes accurate, population-based knowledge of cancer occurrence highly valuable for recognizing trends and the risk factors that create those trends [6]. Cancer registries are the main source of data on the burden of cancer, through systematic registration of cancer incidence, prevalence, survival, and mortality [7]. Nowadays, their work has expanded into the assessment of cancer screening plans and interventions for cancer control. However, deficiencies in the registered information, including the patient's residence, the primary site of the tumor, the date of diagnosis, and the date of death [6], make the registered data inaccurate for use in future planning. Most patients prefer to get medical services in the capital of the country or in neighboring provinces that are equipped with better medical facilities. Because adequate healthcare is lacking in their own city [8], these patients register in the neighboring provinces, and this is the cause of misclassification. The expected coverage rate of cancer is an indicator of misclassification error in registering cancer incidence, as the expected coverage is reported to be more than 100% in some medical universities and less than 100% in others [9].
Two approaches exist to correct misclassification. The first is to validate a sample of the data by rechecking medical records and to extend the results to the target population [10]. The second is the Bayesian method, in which the researcher takes prior evidence into account in the analysis [11] by specifying a prior distribution on the parameters [12]. This study aimed to examine the trend of gastric cancer after estimating the misclassification rate in the registry system using the Bayesian method and re-estimating the incidence rate in each province of Iran.
GMJ.2019;8:e1223 www.gmj.ir

Materials and Methods
Gastric cancer incidence data from 2004 to 2008 were extracted from the National Cancer Registry (NCR) of Iran, which is published annually by the Ministry of Health (MoH) [9]. The NCR collects cancer incidence data in collaboration with the medical universities of the country. Each medical university compiles a dataset of new cancer cases that have been certified by pathology centers. The newly collected cases are entered into software designed by the MoH. At this stage, duplicate cases are removed, and the remaining recorded cancer cases are coded according to the International Classification of Diseases, 10th revision.
The MoH sends the prepared dataset of cancer cases back to the medical universities. For each medical university, an expected coverage of new cancer cases is calculated, with the expected number set at 113 per 100,000 population covered by that university.
Data were entered into the model in 2 vectors. The first vector ($y_1$) contained the age-standardized rate (ASR) for males and females in 4 age groups for the province with less than 100% expected coverage, and the second vector ($y_2$) contained the same data for a neighboring province with more than 100% expected coverage [13,14]. Patients were divided into the following 4 groups: those aged 14 years or younger, 15 to 49 years, 50 to 69 years, and more than 70 years. As vectors $y_1$ and $y_2$ contain count data, a Poisson distribution was assumed for them [15,16], with means $\lambda_{i1}$ and $\lambda_{i2}$ for the $i$th sex-age group.
For the misclassification parameter ($\theta$), which is the probability of recording a case in the wrong group, an informative beta prior distribution was assumed: $\theta \sim \mathrm{Beta}(a, b)$ [17,18]. Prior values for the beta parameters ($a$ and $b$) were selected based on the expected coverage of cancer cases in each province. The expectation of this distribution, $a/(a+b)$, converges to the misclassification rate. Because the misclassification parameter is unknown, a latent variable ($U$) was introduced as the number of cases that in fact belonged to the first group but were wrongly assigned to the second group. A binomial distribution was assumed for the latent variable, that is, $U_i \mid \theta, y_1, y_2 \sim \mathrm{Binomial}(y_{i2}, P_i)$, where $P_i = \lambda_{i1}\theta / (\lambda_{i1}\theta + \lambda_{i2})$ is the probability of wrong classification in the second group.
A sample of 100,000 draws was generated from the posterior distribution $\mathrm{Beta}(\sum_i U_i + a, \sum_i y_{i1} + b)$ by Gibbs sampling [19,20,21]. The misclassification rate was estimated by averaging the draws from the posterior distribution. Analyses were conducted using R software, version 3.2.0.
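The sampling scheme described above can be sketched in a few lines. This is only an illustrative sketch, not the authors' R code. It is written in Python, the function names (`gibbs_theta`, `draw_binomial`) are hypothetical, and the counts in the usage example are invented. The text does not state how the Poisson means were handled, so the sketch assumes a weak Gamma prior on each $\lambda$ (parameters `lam_shape`, `lam_rate`); the binomial step for $U_i$ and the beta update for $\theta$ follow the formulas given in the Methods.

```python
import random
from statistics import fmean


def draw_binomial(rng, n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials (adequate for small counts)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def gibbs_theta(y1, y2, a, b, n_iter=3000, burn_in=500,
                lam_shape=0.5, lam_rate=0.01, seed=1):
    """Return posterior draws of the misclassification probability theta.

    y1, y2 : case counts per sex-age group in the under-covered and the
             over-covered province, respectively
    a, b   : parameters of the Beta(a, b) prior on theta
    lam_shape, lam_rate : ASSUMED Gamma prior on the Poisson means
                          (not specified in the text)
    """
    rng = random.Random(seed)
    theta = a / (a + b)                  # start at the prior mean
    lam1 = [y + 1.0 for y in y1]         # crude starting values
    lam2 = [y + 1.0 for y in y2]
    scale = 1.0 / (1.0 + lam_rate)       # gammavariate takes a scale, not a rate
    draws = []
    for it in range(n_iter):
        # U_i | theta, y ~ Binomial(y_i2, P_i), P_i = lam_i1*theta / (lam_i1*theta + lam_i2)
        u = [draw_binomial(rng, n2, l1 * theta / (l1 * theta + l2))
             for n2, l1, l2 in zip(y2, lam1, lam2)]
        # theta | U, y ~ Beta(sum U_i + a, sum y_i1 + b)
        theta = rng.betavariate(sum(u) + a, sum(y1) + b)
        # Conjugate Gamma updates for the Poisson means (assumed step)
        lam1 = [rng.gammavariate(n1 + ui + lam_shape, scale) for n1, ui in zip(y1, u)]
        lam2 = [rng.gammavariate(n2 - ui + lam_shape, scale) for n2, ui in zip(y2, u)]
        if it >= burn_in:
            draws.append(theta)
    return draws


if __name__ == "__main__":
    # Invented counts for 8 sex-age groups; Beta(2, 3) has prior mean 0.4
    y1 = [2, 10, 25, 18, 1, 6, 12, 9]
    y2 = [5, 30, 70, 55, 3, 15, 35, 25]
    draws = gibbs_theta(y1, y2, a=2, b=3)
    print("estimated misclassification rate:", round(fmean(draws), 3))
```

As in the text, the point estimate is the mean of the retained draws. A real analysis would use far more iterations (the study reports 100,000) and would check convergence before averaging.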

Results
The registered cases of gastric cancer from 2004 to 2008 in Iran were investigated. The ASR of gastric cancer for females increased from 6.42 per 100,000 population (1439 persons) in 2004 to 10.00 per 100,000 (2243 persons) in 2008. Similarly, the ASR of gastric cancer for males increased from 7.03 per 100,000 population (3770 persons) in 2004 to 19.16 per 100,000 (5165 persons) in 2008. The trend of gastric cancer incidence from 2004 to 2008 for both sexes is shown in Figure-1.
Among the 30 provinces of Iran, the data of 21 provinces were entered into the Bayesian model in pairs. The other nine provinces had a coverage of cancer cases that was almost equal to their expected number of cancer patients; hence, the rates of cancer in those provinces remained unchanged. As an example, the percentage of expected cases for Tehran (the capital of Iran), a high-facility province in the central part of the country in terms of equipped healthcare centers and professional doctors, was 155.63% in 2008, whereas the Qom, Qazvin, and Markazi provinces, which are adjacent to Tehran, covered just 53.9%, 66.3%, and 69.6% of their expected number of new cancer cases, respectively. Thus, Tehran registered 55.63% more cases than its expected number, whereas Qom, Qazvin, and Markazi registered fewer cancer cases than expected. Expected coverage rates for the different provinces of Iran from 2004 to 2008 are based on the NCR annual reports [9].
After performing the Bayesian analysis, 37% misclassification was estimated between Tehran and Qom, 32% between Tehran and Qazvin, and 43% between Tehran and Markazi in 2008. Estimated misclassification rates in other provinces are presented in Table-1. The rate of gastric cancer in the study ...

Discussion
There was a remarkable misclassification error in the registration of gastric cancer among adjacent areas in Iran. In addition, gastric cancer incidence increased during the years considered in this study, and this increase was higher in males than in females. The highest estimated misclassification rates belonged to more deprived provinces such as Sistan, Hormozgan, South Khorasan, North Khorasan, and Bushehr. There was also no significant reduction in the misclassification rate during the study years. This indicates that sufficient effort has still not been made to provide healthcare facilities and improve the registration system in all provinces.
The well-known risk factors for gastric cancer are Helicobacter pylori infection, family history of gastric cancer, and smoking. However, some populations with a high prevalence of H. pylori infection and low rates of gastric cancer show that other factors may also be important [3]. In addition, the incidence rate among immigrants tends to be similar to that of the country to which they move rather than that of their country of origin. It can be concluded that environmental factors play a large role in incidence rates [22,23]. Thus, the incidence of cancer is expected to be similar in adjacent regions that are exposed to similar circumstances. The major differences observed in the incidence of gastric cancer can therefore be explained by misclassification error in recording patients' domicile, which causes overestimation or underestimation of the cancer rate in neighboring areas.
Acquiring knowledge about the diffusion of disease among different communities in different areas is an appropriate method for recognizing the factors that influence disease incidence [24] and quantifying the potential for disease control and prevention [25]. However, spatial analysis is usually performed on registered data to find the geographic pattern of disease and determine high-risk areas. In those types of studies, the existence of misclassification is often ignored. As a result, wrong estimates of risk are obtained in different regions.

Conclusion
Our study indicates that some misclassification exists in registering cancer incidence. As registered data are the basic source that health policymakers use to identify high-risk areas in need of more healthcare facilities, misclassification error should be accounted for and corrected. Otherwise, it distorts the needs assessments used to allocate facilities to the provinces and leads to fewer facilities being allocated to the provinces that are in fact in need of more. When valid data are not available, the Bayesian method is a fast and cost-effective way to account for and correct regional misclassification error.

Table 1. Estimated Rate of Misclassification Among Provinces Using Bayesian Method
Figure 1. Trend of gastric cancer for two genders (2004 to 2008) in Iran
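The coverage percentages reported for Tehran and its neighboring provinces can be checked with simple arithmetic. The sketch below assumes that coverage is the ratio of registered to expected cases, with the expected number taken as 113 per 100,000 population as stated in the Methods. The function names and the population and case figures in the example are hypothetical and serve only to illustrate the calculation.

```python
EXPECTED_PER_100K = 113  # expected new cancer cases per 100,000 population (from the Methods)


def expected_cases(population):
    """Expected number of new cancer cases for a covered population."""
    return EXPECTED_PER_100K * population / 100_000


def coverage_percent(observed, population):
    """Registered cases as a percentage of the expected number."""
    return 100.0 * observed / expected_cases(population)


def excess_percent(coverage):
    """Percentage of cases above (positive) or below (negative) expectation."""
    return coverage - 100.0


if __name__ == "__main__":
    # Hypothetical university covering 1,000,000 people with 1,130 registered cases
    print(coverage_percent(1130, 1_000_000))
    # Reported 2008 coverage figures from the Results
    for province, cov in [("Tehran", 155.63), ("Qom", 53.9),
                          ("Qazvin", 66.3), ("Markazi", 69.6)]:
        print(province, round(excess_percent(cov), 2))
```

A coverage above 100% (as for Tehran) signals that cases from elsewhere were registered there, and a coverage below 100% signals that cases were registered outside the province. This is what motivates pairing over-covered and under-covered neighbors in the Bayesian model.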